DESCRIPTION SCHEME AND BROWSING METHOD FOR AUDIO/VIDEO 
SUMMARY 

BACKGROUND OF THE INVENTION 
Field of the Invention

The present invention relates to a description scheme
and a browsing method for summary data (an outline) of
compressed or uncompressed audio/video data (audio data,
video data, or audiovisual data), and particularly to a
description scheme and a browsing method for summary data
serving as feature data attached to audio/video data. The
invention also relates to a description scheme and a
browsing method for summary data of audio/video data that
enable fast and advanced browsing of audio/video data by
continuously presenting small segments or frames of the
audio and video data arranged in sequence.

Description of the Related Art 

The feature description of audio and video data is
currently being standardized in MPEG-7 (Moving Picture
Experts Group phase 7) of ISO and IEC. In MPEG-7, content
descriptors, description schemes, and a description
definition language are being standardized so that
compressed or uncompressed audio and video data can be
searched efficiently. Feature description in MPEG-7 is being
standardized from various viewpoints. Among these items, for
the summary description that enables fast and efficient
browsing of audiovisual data, a description scheme is
specified in which the temporal positions or file names of
key clips are described sequentially in the feature
description file.

Although the determination of key clips is outside the
scope of the standard, it can be performed, for example, by
semantically dividing the audiovisual data into shots or
scenes and extracting significant images (i.e., key frames)
that represent the shots. On the application side, presenting
these key frames continuously at specific or arbitrary time
intervals enables fast, slide-show-like browsing, so that the
summary of the audio and video data can be presented. Such a
summary is hereinafter called a slide summary.

A conventional description scheme for slide summary data
is explained with reference to Fig. 8 and Fig. 9. Fig. 8 shows
an example of a method for composing and describing a slide
summary of a media file P. First, when an audio or video
media file P (assumed here to be video) is input, the first
shot or scene is defined by a shot or scene detection
algorithm (step S51). By applying a key frame detection
algorithm to this shot or scene, the key frame of the first
shot or scene is determined (step S52).

The position of the determined key frame in the original
media file is described in the slide summary description
file as the "media time", expressed as a frame number or a
time code from the beginning of the file (step S52'). In
addition, a slide component header is described in the slide
summary description file at the beginning of each slide
component (step S51'). Optionally, when the determined key
frame is saved as an external file (step S53), the name of
the saved key frame file is described in the slide summary
description file as the "slide component file name" (step
S53').

This is the procedure for describing the slide component
of the first shot or scene, and the procedure is repeated up
to the final shot or scene of the media file P. To reduce the
number of slide components, temporal sub-sampling may be
applied when detecting shots or scenes in the media file P at
step S51.
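The conventional procedure of Fig. 8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the prior-art implementation: scenes are assumed to be (start_frame, end_frame) pairs, the key frame detector and the saver are hypothetical callables supplied by the caller, and temporal sub-sampling is modelled simply as keeping every n-th detected scene, which is only one possible reading of step S51.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A scene as a (start_frame, end_frame) pair; this representation is assumed.
Scene = Tuple[int, int]


@dataclass
class ConventionalSlideComponent:
    media_time: int                        # key-frame position (frame number) in file P (step S52')
    component_file: Optional[str] = None   # optional external key-frame file name (step S53')


def compose_conventional_summary(
    scenes: Sequence[Scene],
    pick_key_frame: Callable[[Scene], int],                  # hypothetical key-frame detector (step S52)
    save_key_frame: Optional[Callable[[int], str]] = None,   # hypothetical saver (step S53)
    subsample: int = 1,                                      # keep every n-th scene (assumed model)
) -> List[ConventionalSlideComponent]:
    summary = []
    for scene in scenes[::subsample]:      # repeated up to the final shot or scene
        key_frame = pick_key_frame(scene)
        file_name = save_key_frame(key_frame) if save_key_frame else None
        summary.append(ConventionalSlideComponent(key_frame, file_name))
    return summary
```

Note that a component built this way records only where its key frame lies; nothing records the boundaries of the scene the key frame was taken from.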

Fig. 9A and Fig. 9B show examples of a slide summary in the
conventional slide summary description shown in Fig. 8. As
shown in Fig. 9A and Fig. 9B, scene 1, scene 2, scene 3, ...
are defined from the beginning of the original content 61,
and each original segment 62 is assumed to be defined by time
codes. For each scene, a slide component 63 is given as a time
code or an external file name, as shown in Fig. 9A and Fig.
9B. Time codes in the original content 61 are described as
the "media time" in the slide components 63.

Fig. 9C shows an actual example of the slide summary
description in this case. The slide summary of the content is
displayed continuously and sequentially as KeyFrame1,
KeyFrame2, KeyFrame3, and so forth. As the display duration
of each slide component, a fixed time may be selected, a time
proportional to the duration of each scene may be assigned,
or a time determined by a preset priority of the scene may be
assigned.
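The three options for the display duration can be illustrated with the short Python sketch below. The function names and parameters are hypothetical, and the linear mapping from priority to seconds is an assumption, since the text does not say how the priority determines the time.

```python
from typing import List, Sequence


def fixed_durations(n_slides: int, seconds: float) -> List[float]:
    """Option 1: every slide is shown for the same specified time."""
    return [seconds] * n_slides


def proportional_durations(scene_lengths: Sequence[float], total: float) -> List[float]:
    """Option 2: display time proportional to the duration of each scene,
    scaled so that the whole slide show lasts `total` seconds."""
    whole = sum(scene_lengths)
    return [total * length / whole for length in scene_lengths]


def priority_durations(priorities: Sequence[int], seconds_per_level: float) -> List[float]:
    """Option 3: display time derived from a preset scene priority.

    The linear rule (priority * seconds_per_level) is an assumption made
    for illustration only.
    """
    return [p * seconds_per_level for p in priorities]
```

For example, scenes lasting 10, 30, and 60 seconds summarized into a 20-second slide show would be shown for 2, 6, and 12 seconds under the proportional option.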

Thus, in the prior art, data indicating to which part of the
original content each slide component belongs is described,
but there is no framework for describing the temporal
segments of the scenes to which the slide components belong.

Moreover, among the conventional feature descriptions of
audio and video data, the slide summary description specifies
only visual data, in the form of key frames or the like, even
for audiovisual data. For example, nothing is specified for
the sequential description of an element corresponding to the
key frame (for example, a key audio clip), either for the
audio portion of audiovisual data or for audio-only music
data.

In the description scheme that describes key frames as
slide components, the temporal position of each key frame in
the original audio and video data can be described, but there
is no link from the slide component to a temporal position in
the original content; for example, there is no way to move
from a key frame displayed as a slide to the shot that
contains that key frame. Similarly, when multiple media files
are regarded as one content, there is no link from the slide
component that specifies the location or file name of the
original media file.

SUMMARY OF THE INVENTION 

It is hence an object of the invention to provide a
description scheme for summary data of audio/video data, and
a browsing method, in which the description of a slide summary
made of slide components that are parts (small segments or
frames) of one or more audiovisual contents makes it possible
to move to the corresponding original content during playback
of a given slide. This is achieved, for example, by adding a
description of the temporal position or location (file name)
of the original content, which specifies the link from the
slide components to the original content.
In order to accomplish this object, the invention is
firstly characterized by a description scheme for summary
data of at least one of audio data, video data, and
audiovisual data (hereinafter called audio/video), wherein,
for one or more compressed or uncompressed audio/video
contents, an audio/video slide is composed of one or more
important portions of the content, the slide components of
the audio/video slide are described sequentially, and this
description includes a description of the link between the
original audio/video contents and the slide components.

The invention is secondly characterized by a browsing
method using the summary data of audio/video, wherein
playback can move from the audio/video slide to the original
audio/video content related to a slide component of the
audio/video slide, and playback can also move back from the
original audio/video content to the audio/video slide.

According to the invention, for one or more audio and
video contents, key audio or video clips belonging to them
are used as slide components, and a slide summary arranging
them sequentially is described efficiently, so that audio and
video data can be browsed at high speed. In addition, by
describing the link from the slide summary to the original
content, an advanced slide summary can be composed.


BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram (single file) showing an
example of slide summary composition in a first embodiment
of the invention;

Fig. 2A through C are diagrams showing the slide summary
and its description examples in the slide summary composition
shown in Fig. 1;

Fig. 3 is a flowchart showing the browsing operation of
the embodiment;

Fig. 4 is a block diagram (multiple files) showing an
example of slide summary composition in a second embodiment
of the invention;

Fig. 5A through C are diagrams showing the slide summary
and its description examples in the slide summary composition
shown in Fig. 4;

Fig. 6A through C are diagrams showing another slide
summary and its description examples in the slide summary
composition shown in Fig. 4;

Fig. 7 is a diagram showing various operations during
slide summary playback realized by the invention;

Fig. 8 is a block diagram showing an example of slide
summary composition in the prior art; and

Fig. 9A through C are diagrams showing the slide summary
and its description examples in the slide summary composition
shown in Fig. 8.


DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

Referring now to the drawings, the invention is
described in detail below. Fig. 1 shows a first embodiment
of slide summary composition using the slide summary
description according to the invention. A feature of this
embodiment is that, for a single original audio/video (audio
data, video data, or audiovisual data) content, a description
of the temporal segment in the original content is added to
the description of the slide components of the audio/video
slide.

As in Fig. 8, when a single compressed or uncompressed
audio or video media file (assumed here to be audio) is
input, the first shot or scene is defined by an audio shot or
scene detection algorithm (step S1). In this embodiment, as
clearly shown in Fig. 2C, the position of this shot or scene
in the original media file is described in the slide summary
description file as the "media location", given by the time
code from the beginning of the file and the duration, that
is, as the description of the temporal segment (step S1'). In
addition, a slide component header is described in the slide
summary description file at the beginning of each slide
component (step S0').

By applying a key clip detection algorithm to this shot
or scene, the key clip (the important clip) of the first
shot or scene is determined (step S2). The position of the
determined key clip in the original media file is described
in the slide summary description file as the "media time",
given by the time code from the beginning of the file or by
other means (step S2').

Optionally, when the determined key clip is saved as an
external file (step S3), the name of the saved key clip file
is described in the slide summary description file as the
"slide component file name" (step S3'). As an example of
saving to an external file, the clip may be encoded at a
higher compression rate or at a lower sampling frequency in
order to reduce the size of the slide component file. For
audiovisual data, only the audio portion may be saved as an
external file.

This is the procedure for describing the slide component
of the first shot or scene, and the procedure is repeated up
to the final shot or scene of the media file (a). The
detection of shots or scenes and the determination of key
clips can be done automatically, manually, or both. In the
explanation above, the description of the temporal segments
in the original content is added to the description of the
slide components of the audio/video slide, but separate files
may be described instead of the temporal segments.
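Steps S0' through S3' of Fig. 1 can be summarized by the following Python sketch, assuming that segments are (start, duration) pairs in seconds and that the scene detector's output is passed in as a list. The key clip detector and clip saver are hypothetical callables, and the check that the key clip lies inside its scene is an added assumption that follows from the key clip being taken from that shot or scene. Compared with the conventional structure, each component now also carries the "media location" of the scene it belongs to.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[float, float]  # (start seconds, duration seconds); assumed representation


@dataclass
class SlideComponent:
    media_location: Segment               # temporal segment of the shot/scene (step S1')
    media_time: Segment                   # key clip within the original file (step S2')
    component_file: Optional[str] = None  # optional external key-clip file name (step S3')


def compose_slide_summary(
    scenes: Sequence[Segment],                                  # output of step S1
    pick_key_clip: Callable[[Segment], Segment],                # hypothetical detector (step S2)
    save_key_clip: Optional[Callable[[Segment], str]] = None,   # hypothetical saver (step S3)
) -> List[SlideComponent]:
    summary = []
    for scene in scenes:                  # repeated up to the final shot or scene
        clip = pick_key_clip(scene)
        start, dur = scene
        c_start, c_dur = clip
        # Assumed sanity check: the key clip must lie inside its own scene.
        if not (start <= c_start and c_start + c_dur <= start + dur):
            raise ValueError("key clip lies outside its scene")
        name = save_key_clip(clip) if save_key_clip else None
        summary.append(SlideComponent(scene, clip, name))
    return summary
```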

Fig. 2A through C show examples of a slide summary using
the slide summary description of the invention. In the
original content 1, the first movement, second movement,
third movement, and so forth are defined from the beginning,
and each segment 2 in the original content is defined by time
codes, as shown in Fig. 2A and Fig. 2B. A slide component 3 is
given as a time code for each scene, as shown in Fig. 2A and
Fig. 2B; however, the slide component 3 may also be specified
as an external file.

In these slide components 3, the time code of the original
content to which each slide component belongs (in this
example, each movement) is described as the "media
location." Fig. 2C shows an example of the actual description
of the slide summary. Normally, the slide summary of the
content is played continuously and sequentially as 01:30 to
01:45, 07:00 to 07:20, 12:20 to 13:00, and so on. However,
when a transition to the original content is signaled during
playback of a certain slide component (for example, 07:00 to
07:20), playback moves to the time code of the original
segment described as the media location (see arrow p), and
the corresponding original segment (the second movement) can
be played. During playback of the original content, if a
transition to the slide summary is signaled again, or when
playback of the original segment ends, playback of the slide
summary resumes from the slide described next after the slide
from which the transition was made (see arrow q).

Fig. 3 is a flowchart showing the details of the above
browsing operation. While the slide components of the
content are being played in the cycle of steps S11, S12, and
S13, if playback of the original content is signaled at step
S12, the process goes to step S14 and moves to the beginning
of the original segment corresponding to the slide component
being played, and playback starts from the beginning of that
segment of the original content at step S15. When playback of
the slide components is signaled during playback of the
original content (affirmative at step S16), the process goes
to step S18 and moves on to playback of the next slide
component. Likewise, when playback of the segment of the
original content ends (affirmative at step S17), the process
goes to step S18 and moves on to playback of the next slide
component. Thus, according to this embodiment, playback can
move from a slide component to the corresponding segment of
the original content. When stop of playback is signaled at
step S19, the browsing operation ends.
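The flow of Fig. 3, including the transitions marked p and q in Fig. 2C, can be modelled as a small event-driven loop. The Python sketch below is illustrative only: the event names and the `browse` function are hypothetical, playback is reduced to a log of ("slide", i) and ("original", i) entries, and browsing is assumed to end when it runs past the last slide component.

```python
from typing import List, Sequence, Tuple

Played = Tuple[str, int]  # ("slide", i) or ("original", i)


def browse(n_slides: int, events: Sequence[str]) -> List[Played]:
    """Replay a sequence of events against the browsing flow of Fig. 3.

    Hypothetical event names:
      "next"     - the current slide component finished (cycle S11-S13)
      "original" - playback of the original content signaled (S12 -> S14, S15)
      "slide"    - return to the slide summary signaled (S16 -> S18)
      "end"      - the original segment finished (S17 -> S18)
      "stop"     - stop of playback signaled (S19)
    """
    if n_slides <= 0:
        return []
    mode, index = "slide", 0
    played: List[Played] = [("slide", 0)]
    for event in events:
        if event == "stop":
            break
        if mode == "slide" and event == "next":
            index += 1
        elif mode == "slide" and event == "original":
            mode = "original"  # jump to the start of this slide's original segment (arrow p)
        elif mode == "original" and event in ("slide", "end"):
            mode, index = "slide", index + 1  # resume at the following slide (arrow q)
        else:
            continue  # event has no meaning in the current mode; ignore it
        if index >= n_slides:
            break  # ran past the last slide component
        played.append((mode, index))
    return played
```

For example, `browse(3, ["next", "original", "end", "stop"])` plays slide 0, slide 1, the original segment of slide 1, and then resumes at slide 2.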

Fig. 4 shows a second embodiment of the slide summary
composition method using the slide summary description
according to the invention. A feature of this embodiment is
that, for multiple original audio/video contents, a
description of the identifier of the original content to
which each slide component belongs is added to the
description of the slide components of the audio/video slide.

This embodiment differs from the first embodiment (Fig. 1
and Fig. 2) in that there are multiple audio and/or video
media files to be described. When the media file group (b)
(assumed here to be audio) is input, the media file names are
described in the slide summary description file as the "media
location", that is, as the description of the identifier of
the original contents (step S11'). In addition, a slide
component header is described in the slide summary
description file at the beginning of each slide component
(step S10').

Next, as in Fig. 1, a key clip detection algorithm is
applied to each file, and the key clip of the first file is
determined (step S12). The key clip can also be determined
manually. The position of the determined key clip in the
original media file is described in the slide summary
description file as the "media time", given by the time code
from the beginning of the file or by other means (step S12').

Optionally, when the determined key clip is saved as an
external file (step S13), the name of the saved key clip file
is described in the slide summary description file as the
"slide component file name" (step S13'). This is the
procedure for describing the slide component of the first
media file, and the procedure is repeated for all input media
files.
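In the second embodiment the "media location" changes from a temporal segment to a file identifier. The following Python sketch is one way to express this, assuming key clips are (start, duration) pairs in seconds within each file; the detector and saver callables and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Clip = Tuple[float, float]  # (start seconds, duration seconds) within one file; assumed


@dataclass
class FileSlideComponent:
    media_location: str                   # identifier of the original file (step S11')
    media_time: Clip                      # key clip inside that file (step S12')
    component_file: Optional[str] = None  # optional external key-clip file name (step S13')


def compose_multi_file_summary(
    media_files: List[str],
    pick_key_clip: Callable[[str], Clip],                        # hypothetical detector (step S12)
    save_key_clip: Optional[Callable[[str, Clip], str]] = None,  # hypothetical saver (step S13)
) -> List[FileSlideComponent]:
    summary = []
    for path in media_files:              # repeated for all input media files
        clip = pick_key_clip(path)
        name = save_key_clip(path, clip) if save_key_clip else None
        summary.append(FileSlideComponent(path, clip, name))
    return summary
```

With input files Song1, Song2, and Song3 and a saver that appends "-Sum" to the file name, the components would carry the external files Song1-Sum, Song2-Sum, and Song3-Sum, matching the naming used in Fig. 5.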

Fig. 5A through C show specific examples of the slide
summary description according to the invention shown in Fig.
4. Suppose there are multiple media files such as popular
song 1, popular song 2, popular song 3, ... (that is, the
media file group (b)), and the file names are given as Song1,
Song2, Song3, ..., as shown in Fig. 5A and Fig. 5B. Slide
components Song1-Sum, Song2-Sum, Song3-Sum, ... are given as
time codes corresponding to each file, as shown in Fig. 5A
and Fig. 5B, and these slide components Song1-Sum, Song2-Sum,
Song3-Sum, ... exist as separate external files. In these
slide components, the location (file path plus file name,
etc.) of the original media file (here, each song) to which
the slide component belongs is described as the "media
location."

An example of the actual description of the slide summary
in this case is shown in Fig. 5C. Normally, the slide summary
of the contents is played continuously and sequentially as
Song1_sum, Song2_sum, Song3_sum, and so on. However, when a
transition to the original content is signaled during
playback of a certain slide component (for example,
Song2_sum), playback moves to the file (Song2) indicated by
the file name described as the media location, and the
corresponding file can be played from the beginning. During
playback of the original file, if a transition to the slide
summary is signaled again, or when playback of the original
file ends, playback of the slide summary resumes from the
slide described next after the slide from which the
transition was made. This operation is the same as that shown
in Fig. 3.

Fig. 6A through C show a modified example of the
embodiment in Fig. 5. In this modified example, as shown in
Fig. 6B, the slide components are combined into one composite
file, whose file name is SongAll_sum. As in the example of
Fig. 5, the location (file path plus file name, etc.) of the
original media file to which each slide component belongs
(here, each song) is described as the "media location." Fig.
6C shows an example of the actual description of the slide
summary in this case. Normally, the slide summary of this
content is played continuously and sequentially as 00:00 to
00:10, 00:10 to 00:25, 00:25 to 00:40, ... of SongAll_sum.
However, as shown in Fig. 6B, when playback start p of the
original content is signaled during playback of a slide
component (for example, 00:10 to 00:25 of SongAll_sum),
playback moves to the file (Song2) indicated by the file name
described as the media location, so that the corresponding
file can be played from the beginning.
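With a composite file, the player has to determine which slide component, and hence which original file, is playing at the current position in SongAll_sum. A minimal Python sketch of that lookup is shown below, assuming the entries of Fig. 6C are available as sorted, non-overlapping (start, end, media location) triples in seconds, each covering its start but not its end; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One entry per slide component inside the composite file:
# (start seconds in the composite file, end seconds, media location of the original).
Entry = Tuple[float, float, str]


def original_for_position(entries: List[Entry], t: float) -> Optional[str]:
    """Return the media location of the slide component playing at time t.

    `entries` must be sorted by start time and must not overlap.
    """
    starts = [start for start, _, _ in entries]
    i = bisect_right(starts, t) - 1   # last component starting at or before t
    if i >= 0 and t < entries[i][1]:
        return entries[i][2]
    return None  # t lies outside every described slide component
```

Assuming the first and third entries belong to Song1 and Song3, a transition signaled at 00:15 of SongAll_sum resolves to Song2, as in the example of Fig. 6B.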

Fig. 7 shows an overview of a browsing device according
to the invention. As shown in Fig. 7, when the slide summary
playback button 11 is turned on, the audio/video slide
summary is played. During playback of the slide summary, if,
for example, the original content attribute display button
12 is pressed to request display of the attributes (title,
file name, etc.) of the original file, the description data
of the original file (for example, its title and file name)
can be displayed in the character data display unit 14.

On the other hand, when the original content playback
start button 13 is pressed during playback of the slide
summary to signal the start of playback of the original
content, the segment of the original content, or the file,
related to the slide being played can be played in the video
data display unit 15.
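The control logic of Fig. 7 can be sketched as a small Python class. This is a toy model, not the device's actual design: the class, field names, and attribute values are hypothetical, the two display units are represented by strings, only the three buttons named in the text are handled, and advancing from one slide to the next is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OriginalInfo:
    """Description data of the original content linked from one slide component."""
    title: str
    file_name: str
    segment: str  # e.g. a time-code range, or "whole file"; representation assumed


class BrowsingDevice:
    """Toy model of the Fig. 7 controls; every name here is hypothetical."""

    def __init__(self, slides: List[OriginalInfo]):
        self.slides = slides
        self.mode = "idle"        # "idle", "summary" or "original"
        self.current = 0          # index of the slide component being played
        self.text_display = ""    # stands in for character data display unit 14
        self.video_display = ""   # stands in for video data display unit 15

    def press(self, button: int) -> None:
        if button == 11:          # slide summary playback button
            self.mode, self.current = "summary", 0
        elif button == 12 and self.mode == "summary":
            # original content attribute display button
            info = self.slides[self.current]
            self.text_display = f"{info.title} ({info.file_name})"
        elif button == 13 and self.mode == "summary":
            # original content playback start button
            info = self.slides[self.current]
            self.mode = "original"
            self.video_display = f"{info.file_name} {info.segment}"
        # any other press is ignored in this sketch
```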

Thus, in the invention, in addition to data specifying to
which original content each slide component belongs, the
temporal segment (such as the shot or scene) to which each
slide component belongs is described, or, if the slide
components belong to different files, the identifier (file
name, etc.) of each file is described. This makes it possible,
during playback of the slide summary, to play back just the
shot or scene to which the slide being played belongs. Hence,
an advanced audio/video slide summary can be provided.



As is clear from the description herein, according to
the invention, a description of the link between the
original audio/video contents and the slide components can be
included in the description of the slide components of a
slide summary of audio/video data. It is also possible to
describe a slide summary covering multiple files, and to move
from a slide component to the original content (temporal
segment or file) to which it relates. The invention is
therefore effective in realizing fast and advanced browsing
of audiovisual data when grasping the outline of the
audiovisual data.